An Irish speech synthesiser

Authors

  • Laure Charonnat
  • G. Ó-Néill
  • Guy Mercier
Abstract

As part of the development of a Text-To-Speech [1] system for Irish [2], a new speech synthesis algorithm has been developed. The algorithm synthesises speech by concatenating diphones and rests mainly on two classical signal processing techniques: linear prediction and Overlap and Add (OLA). Unlike the well-known TD-PSOLA method, it requires no pitch marking. Instead, the recorded segments are modified to produce constant-pitch signals. As a result, the OLA procedures can be applied over broad windows, especially during concatenation, which spectrally smooths the transition between diphones. An initial pitch modification, energy equalisation and, if necessary, lengthening of the shorter sounds are carried out first. The actual synthesis then consists of two modules: concatenation, and prosody matching, which covers pitch and duration modification. Pitch modification (both in the initialisation stage and in prosody matching) is realised through a linear prediction analysis of the signal, which yields estimates of the vocal tract filter and of the glottal signal. To modify the pitch without changing the formant frequencies, each period of the glottal signal is interpolated (or decimated) according to the required pitch modification rate. Duration modification is based on the Synchronous Overlap and Add (SOLA) time-scale modification algorithm proposed by Roucos and Wilgus [3]. The method and the computation of its parameters have been optimised, giving very high-quality time-scale modification. Finally, the concatenation module overlaps the phoneme that the two diphones being concatenated have in common. Computing their cross-correlation allows them to be synchronised, which avoids phase mismatch. The constant pitch allows a large overlap of the signals.
Before the signals are added, two half Hamming windows (the first decreasing, the second increasing) are applied to them to produce a smooth spectral transition. The algorithm has been tested on Irish sentences. The diphones were extracted from a corpus recorded by an Irish speaker who made a deliberate effort to keep a constant pitch, which facilitates the initial pitch modification. The prosody of each sentence was defined from a reference pronunciation of the same sentence. The synthesised sentences are fairly clear and show some degree of naturalness.
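The pitch-modification step in the abstract (a linear prediction analysis, then interpolation or decimation of each period of the glottal signal) can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation. It assumes a single LPC analysis over one short constant-pitch segment, takes the LPC residual as the glottal-signal estimate, uses a known pitch period in samples with period boundaries at fixed multiples of that period (rather than glottal-closure instants), and resamples by linear interpolation without an anti-aliasing filter. The function names and the default LPC order of 10 are hypothetical.

```python
import math


def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a list of samples."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns [1, a1, ..., ap] so that A(z) = 1 + a1 z^-1 + ... + ap z^-p is the
    LP inverse filter; 1/A(z) is the vocal-tract estimate.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predicted input: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def inverse_filter(x, a):
    """Residual e[n] = x[n] + sum_j a[j] x[n-j], used as the glottal estimate."""
    p = len(a) - 1
    return [x[n] + sum(a[j] * x[n - j] for j in range(1, min(p, n) + 1))
            for n in range(len(x))]


def synthesis_filter(e, a):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_j a[j] y[n-j]."""
    p = len(a) - 1
    y = []
    for n in range(len(e)):
        y.append(e[n] - sum(a[j] * y[n - j] for j in range(1, min(p, n) + 1)))
    return y


def resample_linear(seg, new_len):
    """Linearly interpolate seg onto new_len points (endpoints preserved)."""
    m = len(seg)
    if new_len == 1 or m == 1:
        return [seg[0]] * new_len
    out = []
    for i in range(new_len):
        pos = i * (m - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        out.append(seg[lo] * (1.0 - frac) + seg[hi] * frac)
    return out


def modify_pitch(x, period, rate, order=10):
    """Scale the pitch of a constant-pitch segment by `rate` (> 1 raises it).

    Hypothetical assumptions: one LPC analysis for the whole segment, a known
    constant `period` in samples, and period boundaries at fixed multiples of
    `period` rather than at glottal closures.
    """
    a = levinson(autocorr(x, order), order)
    e = inverse_filter(x, a)
    new_period = max(1, round(period / rate))
    out = []
    for start in range(0, len(e), period):
        seg = e[start:start + period]
        # Decimate (shorter period, higher pitch) or interpolate (longer period).
        new_len = max(1, round(len(seg) * new_period / period))
        out.extend(resample_linear(seg, new_len))
    # Re-apply the unchanged vocal-tract filter so the formants are kept.
    return synthesis_filter(out, a)
```

Because each period is shortened or lengthened, the segment's duration also changes by roughly a factor of 1/rate; for example, `modify_pitch(x, 40, 2.0)` on a 400-sample segment returns 200 samples. In a pipeline like the one described, the duration-modification stage is where the target length would be restored.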
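The duration-modification module is based on SOLA [3]. A textbook version of SOLA can be sketched as follows: analysis frames taken every `hop_a` samples are placed every `alpha * hop_a` samples in the output, each shifted by the lag (within plus or minus `search` samples) that maximises its normalised cross-correlation with the output built so far, and then cross-faded in linearly. This is a sketch under stated assumptions, not the optimised variant the authors mention: the frame, hop and search sizes are hypothetical defaults, and any input left after the last full frame is dropped.

```python
import math


def sola(x, alpha, frame=400, hop_a=150, search=30):
    """Time-scale x by roughly `alpha` (> 1 lengthens) with textbook SOLA.

    frame, hop_a and search are hypothetical defaults in samples, not the
    authors' optimised parameters.
    """
    hop_s = int(round(alpha * hop_a))  # synthesis hop
    # Guarantee every overlap is non-empty and at most one frame long.
    if not (2 * search <= hop_s < frame - 2 * search):
        raise ValueError("alpha incompatible with frame/hop/search sizes")
    y = list(x[:frame])
    m = 1
    while m * hop_a + frame <= len(x):
        seg = x[m * hop_a:m * hop_a + frame]
        target = m * hop_s
        best_k, best_c = 0, -math.inf
        for k in range(-search, search + 1):
            start = target + k
            if start < 0 or start >= len(y):
                continue
            ov = min(len(y) - start, frame)
            num = sum(y[start + i] * seg[i] for i in range(ov))
            den = math.sqrt(sum(y[start + i] ** 2 for i in range(ov))
                            * sum(seg[i] ** 2 for i in range(ov)))
            c = num / den if den > 0 else 0.0  # normalised cross-correlation
            if c > best_c:
                best_k, best_c = k, c
        start = target + best_k
        ov = min(len(y) - start, frame)
        # Linear cross-fade over the overlap, then append the rest of the frame.
        for i in range(ov):
            w = (i + 1) / (ov + 1)
            y[start + i] = (1.0 - w) * y[start + i] + w * seg[i]
        y.extend(seg[ov:])
        m += 1
    return y
```

With these defaults `alpha` must lie roughly between 0.4 and 2.2, and the range check raises `ValueError` otherwise. The synchronisation search is what distinguishes SOLA from plain overlap-add: each frame is placed where it best matches the waveform already in the output, which avoids phase jumps at the frame joins.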
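The concatenation step (cross-correlation to synchronise the shared phoneme, then a decreasing half Hamming window on the first diphone and an increasing one on the second) can be sketched in the same plain-Python style. This is a minimal illustration under stated assumptions, not the authors' code: both diphones are lists of samples at the same constant pitch, `overlap` is a hypothetical number of samples of the common phoneme to overlap, and the lag search only shifts the second diphone forward by up to `max_lag` samples (about one pitch period should suffice when the pitch is constant).

```python
import math


def half_hamming(n, rising):
    """First (rising) or second (falling) half of a 2n-point Hamming window."""
    full = [0.54 - 0.46 * math.cos(2 * math.pi * i / (2 * n - 1)) for i in range(2 * n)]
    return full[:n] if rising else full[n:]


def concatenate(a, b, overlap, max_lag):
    """Join diphone `a` to diphone `b` over `overlap` samples of their shared phoneme.

    The start of `b` is shifted by the lag in [0, max_lag] that maximises the
    normalised cross-correlation with the tail of `a`, so the two waveforms
    are in phase before the half-Hamming cross-fade. Returns (signal, lag).
    """
    if overlap < 1 or overlap > len(a) or overlap + max_lag > len(b):
        raise ValueError("signals too short for the requested overlap/lag")
    tail = a[len(a) - overlap:]
    best_lag, best_c = 0, -math.inf
    for lag in range(max_lag + 1):
        head = b[lag:lag + overlap]
        num = sum(t * h for t, h in zip(tail, head))
        den = math.sqrt(sum(t * t for t in tail) * sum(h * h for h in head))
        c = num / den if den > 0 else 0.0
        if c > best_c:
            best_lag, best_c = lag, c
    head = b[best_lag:best_lag + overlap]
    fall = half_hamming(overlap, rising=False)  # applied to the first diphone
    rise = half_hamming(overlap, rising=True)   # applied to the second diphone
    mixed = [t * f + h * r for t, h, f, r in zip(tail, head, fall, rise)]
    return a[:len(a) - overlap] + mixed + b[best_lag + overlap:], best_lag
```

Because of the Hamming window's 0.08 pedestal, the two halves sum to roughly 1.08 rather than exactly 1 across the overlap, so a practical system might rescale the mixed region; the abstract does not say whether the authors do.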

Similar resources

On evaluating synthesised visual speech

This paper describes issues relating to the subjective evaluation of synthesised visual speech. Two approaches to synthesis are compared: a text-driven synthesiser and a speech-driven synthesiser. Both synthesisers are trained using the same data and both use the same model for rendering the synthesised visual speech. Naturalness is used as a performance metric, and the naturalness of real visu...

Using same-language machine translation to create alternative target sequences for text-to-speech synthesis

Modern speech synthesis systems attempt to produce speech utterances from an open domain of words. In some situations, the synthesiser will not have the appropriate units to pronounce some words or phrases accurately but it still must attempt to pronounce them. This paper presents a hybrid machine translation and unit selection speech synthesis system. The machine translation system was trained...

Development of an emotional speech synthesiser in Spanish

Currently, an essential point in speech synthesis is addressing the variability of human speech. One of the main sources of this diversity is the emotional state of the speaker. Most of the recent work in this area has focused on the prosodic aspects of speech and on rule-based formant synthesis experiments. Even when adopting an improved voice source, we cannot achieve a smiling hap...

An HMM-based speech synthesiser using glottal post-filtering

Control over voice quality, e.g. breathy and tense voice, is important for speech synthesis applications. For example, transformations can be used to modify aspects of the voice related to speaker’s identity and to improve expressiveness. However, it is hard to modify voice characteristics of the synthetic speech, without degrading speech quality. State-of-the-art statistical speech synthesiser...

The Accurate Estimation of Articulatory Synthesiser Parameters through Reducing the Degree of Saturation

A new method is proposed to correctly estimate the parameters of an articulatory speech synthesiser using an MLP neural network. This is achieved by modifying the statistical characteristics of the acoustic input pattern vectors to prevent the activation level of the hidden nodes from approaching saturation. The technique results in considerably faster neural learning and a more acc...

Factors Influencing Vocal Pitch in Articulatory Speech Synthesis: A Study Using PRAAT

An extensive study of the parameters influencing the pitch of a standard speaker in articulatory speech synthesis is presented. The speech synthesiser used is the articulatory synthesiser in PRAAT. Specifically, the effect of two parameters, Lungs and Cricothyroid, on the average pitch of the synthesised sounds is studied. Statistical analysis of synthesis data shows the extent to whic...

Publication date: 1998